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Abstract: The principle of indirect elimination states that an algorithm for 
solving discretized differential equations can be used to identify its own bad- 
converging modes. When the number of bad-converging modes of the algorithm 
is not too large, the modes thus identified can be used to strongly improve the 
■ convergence. The method presented here is applicable to any standard algo- 

^sO . rithm like Conjugate Gradient, relaxation or multigrid. An example from theo- 

retical physics, the Dirac equation in the presence of almost-zero modes arising 
00 ' from instantons, is studied. Using the principle, bad-converging modes are re- 

moved efficiently. Applied locally, the principle is one of the main ingredients 
I of the Iteratively Smooting Unigrid algorithm. 



Q_i' 1 Introduction 

Discretized differential equations lie at the heart of many simulation algo- 
rithms in physics. A large variety of solution algorithms like Conjugate Gradi- 
ent, Overrelaxation, or Multigrid exist to deal efficiently with such problems 



T4J. The convergence of these algorithms usually depends on the condition 
number of the problem operator, i.e. the quotient of its largest and smallest 
eigenvalue. (For many simple problems multigrid methods will always con- 
verge well. Here we are not interested in such cases.) When the number of 
eigenmodes with very small eigenvalues is not large, each of these methods 
could be accelerated if an additional method for dealing with these modes 
would be applied. 

In this paper we want to study a method that can be used to do exactly this. 
It is partly based on the multigrid idea and relies on a surprisingly simple 
principle, called the Principle of indirect elimination or PIE. 

We will explain this principle in general context and then apply it to a case 
where the occurence of almost-zero modes spoils the convergence of standard 
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methods, namely the Dirac equation in a gauge field background with instan- 
tons |7jT0||. We will also show connections to an idea by Kalkreuter somewhat 
similar in spirit, called the updating on the last point [IT], and explain why 



our method is more general. Finally, we will briefly remark on the connections 
to the Iteratively Smoothing Unigrid algorithm B. 



2 The general problem 



Consider a linear operator D which may arise from a discretized differential 
equation. Here and in the following we assume D to be positive definite, if it 
were not, we could use the operator D*D instead. The general form of the 
equation to be solved is then 

D£ = f . (1) 

Let us call the lowest eigenvalue of the operator e []. Its value determines the 
criticality of the operator because the smaller it is the larger the condition 
number (quotient of largest and smallest eigenvalue) of the operator will be. 
If Eq — the problem is ill-posed because the contribution of this zero-mode 
to the solution is not determined. For small Eq standard iterative methods 
will converge only slowly, the convergence time r (the number of iterations 
needed to reduce the error by a factor of e) behaving like r oc k 2 / 2 , where k 
is the condition number of D and z is the critical exponent. This behaviour 
is called critical slowing down because the more critical the problem gets the 
slower the algorithm will be. For relaxation methods, one usually finds z ~ 2, 
Conjugate Gradient has a critical exponent of z ~ 1. An optimal algorithm 
should have a critical exponent of 0. 

At each time-step, any iterative method will yield an approximate solution £. 
We introduce two important quantities: the error e = £ — £ which is the dif- 
ference between the true and the actual solution and is of course not known, 
and the residual r = f — D£, the difference between the true and the actual 
righthandside. With these definitions we can recast the fundamental equa- 
tion ([[]) as 

De = r , (2) 

called the error equation. 

For a linear method, we can also introduce the iteration matrix S which tells 
us what the new error after the next iteration step will be, given the old one: 

e ncw = g e old _ (3) 



1 It is not fully correct to speak about eigenvalues of D. In section 2.1 we will 
explain what is meant by such a statement. 
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The concrete structure of the iteration matrix is irrelevant for the following 
discussion, 
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for examples. The important point here is that the 
iteration matrix is the reduction matrix for the error. Its eigenvalues should 
lie between minus one and one and convergence is governed by the eigenmode 
of S with absolute value of the eigenvalue closest to one. 

In the following sections we will usually assume the algorithm to be linear be- 
cause the existence of an iteration matrix eases the analysis. Nevertheless the 
method presented here could be applied to the Conjugate Gradient algorithm 
as well, see also section WJ1 



2. 1 Remark on vector spaces 



For the analysis it is important to distinguish between a vector space and its 



dual ||15|| . The differential operator D maps a vector £ 6 V to a vector in the 
dual space f G V*. To see this, consider the Laplace equation in electrody- 
namics as an example: A0 = — g. The Laplace operator maps a potential onto 
a charge density. These two objects can be regarded as dual vectors because 
there is a unique way of assigning a real number to them, namely the energy 
/ g(x)(p(x)dx. The Laplace operator therefore provides us with a bilinear form 
(4>,ijj) A = J <p(x)(Aip)(x)dx. However, there is no natural identification be- 
tween the vector space and its dual besides that given by this scalar product. 
We will later see an example where one is easily drawn to wrong conclusions 
if this distinction is not taken into account. 

It is not really meaningful to speak about eigenvectors or -values of bilinear 
forms. On the other hand, the iteration matrix of relaxational methods maps 
the error to another error and is therefore a map S : V —>■ V, possessing 
eigenvectors. It are the eigenvalues of this matrix that determine the conver- 
gence. The standard identification of eigenvectors of S with eigenvectors of D 
is done using additional structure. This is given by the matrix B which is de- 
fined through the relation S = I — Bg : D. (Standard relaxation methods arise 
from splitting the fundamental operator D = B + Co, where B is chosen 
such that it approximates D as good as possible but is "easy to invert".) B is 
an additional bilinear form and furnishes us with a scalar product in addition 
to the scalar product given by D. 

For Conjugate Gradient, the situation is similar: Conjugate Gradient updating 
steps require computations of scalar products, e.g. a = (r, r)/ (d, Dd), where d 
is the search vector. Here we need another scalar product than the D-product. 

It is therefore only correct to speak of eigenvectors of D when we have chosen 
a basis that is in some sense natural. For example, if we use the standard site- 
wise basis and find that the eigenvectors of D in this basis agree with those of 
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S, the sloppy way of speach is justified. This will be the case for the example 
we will study below. Nevertheless, in the theoretical parts of this paper we 
will be more strict. 



3 PIE in general 

After these preliminaries we formulate the 

Principle of indirect elimination (PIE): It is easier to calculate the shape of 
a bad-converging mode for a certain algorithm than to reduce it directly using this 
algorithm. 

To see this, consider the case where there is only one bad-converging mode and 
all others are reduced efficiently by the algorithm. We now use the algorithm 
to try to solve an equation of which we already know the solution, for example 
the equation D£ = 0. In this case we have £ = — e, so we know the error as 
well. Remembering equation (|3]) we see that we can now directly investigate 
how the iteration matrix acts. After n iterations we have 

£(n) = S n p) , (4) 

where ^°- ) is the initial guess we started with and is the approximate 
solution after the n-th iteration. For n — > oo S n projects onto the eigen- 
vector of S with the largest absolute value of the eigenvalue, which is the 
slowest-converging mode. For finite n the accuracy of the projection depends 
on quotient between the largest and the second-largest eigenvalue: The larger 
this is, the better the projection will be. (This can be seen easily by imagin- 
ing S to be diagonalized.) In the model case considered here, where there is 
only one bad-converging mode, this quotient will be large and so S™^ ** will 
converge rapidly against the bad-converging mode. 

If the number of bad-converging modes is larger than one, but still small, we 
can use the same technique to calculate them if we take care of orthogonalizing 
the approximations to the already known modes. By this it is obvious that 
this method will only be useful if this number is not too large, otherwise 
the calculations will take too much time. We will later comment on how the 
principle of indirect elimination can be applied locally and used to construct 
a multigrid algorithm. 

Let us come back to the case of only one bad-converging mode. If we have 
calculated this using the principle of indirect elimination, how can we apply 
this knowledge to improve convergence? 
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The answer relies on multigrid ideas and is in fact very simple. Let us call the 
bad- converging mode w. We define an operator A : R — > V, fi h- » /iw that 
creates a vector on the fundamental lattice from a number. This cumbersome 
notation has a two-fold purpose: First it stresses the similarity to multigrid 
ideas, where A would be called an interpolation operator, second it will later 
allow us to study the case where A is not exactly equal to the bad-converging 
mode w to see how this will affect the convergence. 

To solve the inhomogenuous equation, we first apply our standard iterative 
solver a few times. This will reduce all components of the error appreciably 
except for a part proportional to w: e « cw. Inserting this knowledge into the 
error equation (g) or using the fact that r = De »s Dwc we get 

D(gA) « r A*VAc « . (5) 

In other words, we have transformed the fundamental equation, living on a 
large lattice, into an equation for scalars (or simple matrices in the case of a 
gauge theory, see below). This new equation can be considered to live on a 
lattice with only one point. In multigrid language this is often called the "last- 
point lattice" as we have there a whole tower of coarser and coarser lattices 
of which the last consists of only one point. 

The equation on the last point can be solved easily to get c and afterwards 
we correct our approximation: £ <— £ + Ac. Thus we have reduced that part 
of the error corresponding to A. It is well-known from the multigrid context 
that using the largest mode of S as interpolation operator will yield the best 
convergence (Greenbaum criterion ||). If the iterative method used before 
has not been perfect, i.e. if the error still contains contributions from other 
modes, we now have to start the iteration again to act on the remaining parts. 
This may again introduce error-components proportional to A which are then 
reduced by another "last-point updating" . 

We can now understand the reason why the principle has been called principle 
of indirect elimination: Direct elimination of the bad-converging mode using 
the iterative solver does not work efficiently, but an indirect approach, first 
trying to solve an auxilliary equation and only afterwards addressing the real 
problem, works fine. 

In practice the situation will not be the idealized one described above. We 
now want to study two situations: What will the result of the correction be 
when the error is not an exact multiple of the zero-mode w, and what happens 
when A deviates from w? 

In the first case it is easy to prove that after the correction £ and A will be 
D-orthogonal even if the error before was not a multiple of A, but contained 
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an additional contribution v: 



-£ = e = cA + v 

r = De = BAc + Dv 

The equation on the last point is then: 

A*~DAx = „4*D„4c + „4*Dv 

=>- X = c + 



A*BA 

Correcting £ yields 



~_ -A'Dv 

4 A*BA V 



This gives D-orthogonality: 



A*BA 

Now we want to investigate the second question, namely how well the approx- 
imation of the zero-mode has to be. To do so we can prove the following rather 
trivial 

Theorem 1 We have an algorithm consisting of two parts. The first part 
is able to eliminate completely all components of the error except one single 
mode w, so we have e = w. The second updating then consists of an updating 
on the last point as described above using an approximation A ofw. 

We can split the bad-converging mode w into two D-orthogonal parts: 

w = Ac + v , with (A, Dv) = . (6) 

Then the iteration matrix M of the full algorithm consisting of both steps has 
the (squared) energy norm (with respect toD) 

\\M\\l= ^\ . (7) 
11 " D (w,Dw) v ; 



Proof. The energy norm is defined as 

l|M|l °=7^D^r • 

Let S be the iteration matrix of the first part of the algorithm. As it eliminates 
all parts of an arbitrary error except the mode w, it is clear that the supremum 
in the definition will be reached for £ = w. S does not affect w. After the 
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iteration only that part of w that is D-orthogonal to A will remain, see the 
calculation above. So we get Mw = v. 



Thus we have 



(Mw,DMw) (v,Dv) 



(w, Dw) (w, Dw) 



It is also useful to look at this geometrically: The angle 6 between the vector w 
and Ac with respect to the scalar product defined by D is given by 

(w,T>Ac) 
{w,-Dwy/ 2 (Ac,-DAcy/ 2 ' 

The reduction works by first projecting w onto the direction given by A and 
then taking the D-orthogonal part of this. This orthogonal part is the vector v; 
it is all that remains after the coarse-grid correction step. The length of this 
vector is given by ||v|| D = ||w|| D sin#. The reduction factor, which is equal to 
the norm of the iteration matrix, is sin 0: 




Using Pythagoras' theorem we get 



• 2 Q -\ 2 a -\ (w,D„4c) 2 

sm 2 6 = 1- cos 2 9 = 1 



(w, Dw) (Ac, BAc) ' 

and inserting the split of the vector w and again using the orthogonality 
property, we finally arrive at 

. 2 ((Ac,BAc) + (v,BAc)) 2 (w, Dw) - (Ac, BAc) (v,Dv) 

sin = 1- 



(w, Dw) (Ac, DAc) (w,Dw) (w,Dw) 



What are the implications of this theorem? First it must be understood that 
the energy norm of the iteration matrix will be equal to the spectral radius 



7 



provided the matrix S is D-symmetric, i.e. S*D = DS This is true when 
the operator D does not mix the mode w with the other modes. We can 
then regard w as an eigenmode of D because the matrix S provides us with 
an identification of the mode w with the corresponding mode in the dual 
space. Hence the energy norm directly tells us about the convergence rate 
of the algorithm. S will also be D-symmetric for standard iterative methods 
like Jacobi-relaxation and can be made so as well for SOR or Gaufi-Seidel 
relaxation. 



By choosing w as the zero-mode the theorem shows how important the correct 
treatment of this mode is. The closer the range of the interpolation operator 
A is to the zero-mode and the smaller the energy norm of the residual part 
v the better the convergence will be. (In the limiting case where the two 
are identical, the difference vector is zero and the error of the zero-mode is 
eliminated perfectly, as expected.) It is not only important that the zero- 
mode is approximated well by the interpolation operator on the last point, 
the convergence will also be better when the difference vector between zero- 
mode and the mode used for the interpolation has a small energy norm and 
is as smooth as possible. 



The theorem also serves to explain a finding by Kalkreuter [[LI]]. He found 
that it is possible to eliminate critical slowing down in a multigrid algorithm 
(actually a two-grid) for the standard Laplace equation even with interpolation 
operators that are not able to represent the zero-mode (which is a constant 
in this case) exactly, but only approximately. In the light of our theorem this 
could be understood if the difference vector has a small energy norm. This, 
however, has not been tested. 



Thus we have seen the importance of the correct treatment of the zero-mode. 
Other methods to remove convergence problems caused by the zero-mode can 
be thought of. Kalkreuter JET] proposed a simple rescaling of the approximate 
solution £ in addition to a multigrid or a relaxation algorithm to improve 
the convergence. This method completely eliminates critical slowing down in 
the simplest model problem, the Laplace equation on a two-point grid. The 
rescaling amounts to using the approximate solution £ itself as interpolation 
operator A. 



2 Actually, this is a nice example for the necesssity to distinguish endomorphisms 
and bilinear forms: treating D as an endomorphism, not as a bilinear form, we 
would transform it using the wrong relation and loose the D-symmetry property 
after the transformation: We have S*D = DS. Now if we transform both using 
the transformation for endomorphisms, we get (S')*D' = (U*S*(U _1 )*)U _1 DU! 
Here there is no cancellation as it would be with the correct transformation: 
S T 'D' = (U*S*(U _1 )*) U*DU. Only if U is orthogonal or unitary do we get the 
same cancellation. 
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The motivation for this updating scheme can be found in the following argu- 
ment: Consider again the equation D£ = f. Solving this gives £ = D _1 f = 
(Bg 1 D) _1 Bq 1 f where we have inserted a unit matrix. Let us fix the righthand- 
side and increase the criticality of the problem. The more critical it gets the 
smaller the lowest eigenvalue Eq of (Bg *D) will be. The solution £ will then 
have larger and larger contributions from the lowest eigenmode of (Bq 1 D). 
Therefore £ itself will be a good approximation to the bad-converging mode 
and can be used as interpolation operator. 

We see that this method is very similar to the principle of indirect elimina- 
tion. However, the principle of indirect elimination will always provide us with 
an approximation to the zero-mode without any contribution from a given 
righthandside, whereas Kalkreuter's method will only work well for large crit- 
icality: The interpolation operators used by the methods are A = S^ -* for 
the principle of indirect elimination and A = S n £^ + Bq n f for Kalkreuter's 
method. Even more important, the principle of indirect elimination can be 
used several times to remove more than just one mode, this is impossible with 
the other approach. On the other hand for large criticality and the case of 
only one bad-converging mode, Kalkreuter's method has the advantage of not 
needing auxilliary iterations to calculate A because £ is used. 

Kalkreuter used this method in addition to a usual multigrid or relaxation 
method. He found that there is no strong improvement for a multigrid algo- 
rithm, but for standard local relaxation the asymptotical critical slowing down 
(i.e. critical slowing down for fixed grid size and infinitely many iterations) was 
eliminated for the Laplace equation with periodic boundary conditions. This 
is what we expect for a method that treats the lowest mode of the problem 
correctly because in this case it is the eigenvalue of the second-lowest mode 
that determines the convergence and this scales with the size of the grid, not 
with the lowest eigenvalue. So for increasing grid sizes critical slowing down 
should still be present; this in agreement with Kalkreuter's results. 



4 Killing Instantons 

4-1 The Dirac operator 

Our example for a discretized differential equation with a small number of 
bad-converging modes is taken from theoretical high-energy physics, namely 
the two-dimensional Dirac equation on the lattice in a gauge field background 
with periodic boundary conditions. 

For an introduction to Lattice Gauge Theory consult . Here we only present 
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the framework: Consider a regular, (^-dimensional (hyper-) cubic lattice A with 
lattice constant a, lattice points z and directed links (z,/j). The opposite link 
is then denoted by (z + //, — //) , where z + // means the next neighbour of z 
in /i-direction. The direction index fi runs from —d to cf. Usually, the lattices 
used will be finite with an extension of L points in each dimension so that the 
number of degrees of freedom is n = L d . 

A lattice gauge theory is defined by a gauge group G which might for exam- 
ple be U(l) or SU(2). Elements of the gauge group act on a vector space V 
which for the examples above would be C and C 2 , respectively. The computa- 
tions presented below were done in two dimensions with gauge group U(l), so 
that instantons can occur. A lattice gauge field (in this case) assigns a U(l)- 
"matrix" U(z, fi) (which is simply a phase) to every link of the lattice, subject 
to the condition U(z, fi) = U(z + fi, —fi)' 1 . These matrices are distributed ran- 
domly with a Boltzmannian probability distribution oc exp(— f3Sw(U)), where 
Sw is the standard Wilson action of lattice gauge theory 

S W {U) = Tr (! - U{dp)) with U(dp) = U(b A )U(b 3 )U(b 2 )U(bi) 
v 

for a plaquette p of the lattice with links &1...&4 at its boundary. This dis- 
tribution leads to a correlation between the gauge field matrices with finite 
correlation length \ for finite (3. The case (3 = corresponds to a completely 
random choice of the matrices (x = 0), for f3 = 00 all matrices are 1 (x = 00). 
In this sense, (3 is a disorder parameter, the smaller (3 the shorter the correla- 
tion length and the larger the disorder. 

The Dirac operator acts on matter fields £ living on the nodes of the lattice. 



In Kogut-Susskind formulation JT2[ it is defined as 

( pO (z) = \ E V»* A*)* ft* + A*) - U(z, -n)* e(z - a*)) • 

/Lt=l 

Here the r)^ z = ±1 are the remnants of the Dirac matrices 7 M in the continuum. 

As the Dirac operator itself is not positive definite, we will use its square Jf) 2 
in the following. The squared Dirac has the property of totally decoupling the 
even and odd parts of the lattice; if we color the lattice points in checkerboard 
fashion, any red point is only coupled to other red points, so that we can 
restrict our attention to one of the sub-lattices. This will be especially useful 
because it lifts the degeneracy of the eigenvalues: Usually each eigenvalue of 
the Dirac operator is degenerated twice, but we can choose the eigenvectors 
to live separated on the sub-lattices. For the sake of brevity we will generally 
speak of the Dirac operator even when we mean the squared Dirac. 
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Fig. 1. The Atiyah-Singer theorem on the lattice: Shown are the five lowest eigen- 
values of the staggered squared Dirac operator in a U(l) gauge field at [5 = 10 on 
an 18 2 -lattice for different topological charges Q. Only one of the two sub-lattices 
is taken into account, on the full lattice each eigenvalue is degenerated twice. 



4-2 The instanton problem 



Many algorithms for solving the Dirac equation become problematic in the 
presence of instantons. Instantons are gauge field configurations that are topo- 
logically non-trivial but possess zero energy. Such configurations are only pos- 
sible for certain choices of the dimension and the gauge group, in two di- 



mensions instantons can occur when the gauge group is U(l), see [0 for an 
introduction. The Atiyah-Singer theorem states that at instanton charge Q 
the spectrum in the continuum will possess 2\Q\ exact zero- modes M; these 
become modes with extremely small eigenvalues on the lattice j7j. For the 
purpose of this paper it is not necessary to have an understanding of what an 
instanton is, it is only important that they are special gauge field configura- 
tions giving rise to almost-zero-modes on the lattice. Figure [l] shows the lower 
part of the spectrum of the squared Dirac operator on an 18 2 -lattice at /3 = 10 
for different instanton charges Q, taking only one of the two sublattices into 
account. Clearly the Atiyah-Singer theorem is nicely reflected on the lattice. 

When instantons are present, the condition number of the Dirac operator 
becomes very large and the problem is ill-posed. In jrO] this problem is in- 
vestigated in detail for the Parallel Transported Multigrid. In this section, we 
want to use the principle of indirect elimination to show how an algorithm 
which converges well in the absence of instantons can be adapted to a case 
with instantons. 
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The idea is very simple: If m bad- converging modes are present, use the prin- 
ciple of indirect elimination m times to calculate approximations A % to these 
modes. 

The method presented here could be applied to a Conjugate Gradient algo- 
rithm. For this algorithm, Dilger |7j has found that the number of iterations 
needed strongly increasses with the instanton charge, therefore Conjugate Gra- 
dient would benefit from the application of the method described here. 

However, we will choose the ISU algorithm on small lattices and at quite large 
values of (3 for a U(l) gauge field as an example. This algorithm has been 
described in detail in [0 . As it is in some parts based on the principle of indirect 
elimination, some remarks will be made on this method in the next section. 
For this section it is not necessary to understand how ISU works, it suffices 
to know that for the parameters chosen the ISU algorithm converges well for 
instanton charges or ±1 at large (3 but badly for larger instanton charges. The 
reason is that the algorithm in its standard form contains one interpolation 
operator on the last point (which is calculated as an approximation to the 
zero-mode) and so it is able to eliminate one zero-mode, but not more. 

In the improved algorithm one tries to solve the equation DA 1 = with the 
given algorithm. As it eliminates all other modes quickly the approximate so- 
lution will converge against a linear combination of the bad-converging modes. 
Then we start the procedure again, but now orthogonalizing the approximate 
solution to the interpolation operator we already know, doing this successively 
for all m bad- converging modes. (As the instanton charge can be easily mea- 
sured, one usually knows beforehand how many operators are needed; if one 
does not for some reason, a dynamical approach can be chosen: Simply pro- 
ceed calculating the next interpolation operator until the convergence rate of 
the trivial equation becomes good enough.) 

The overall work for this procedure is proportional to the square of the number 
of bad-converging modes, as is the work of actually applying the interpolation 
operators to eliminate them. (The number gets squared because of the need to 
calculate an effective operator on the last point layer. However, the effective 
operator only needs to be calculated once for each configuration.) This restricts 
the method to cases where the number of bad-converging modes is not too 
large, which usually is the case for instanton charges. 

Figure ^| shows the performance of the usual ISU method compared to the 
improved version for the Dirac operator in a U(l) gauge field with different 
instanton charges. We measured the asymptotic convergence time, i.e. the 
number of iterations asymptotically needed to reduce the error by a factor 
of e. The improved version of ISU used a number of interpolation operators 
on the last point equal to the instanton charge which equals the number of 



12 



- O Simple scheme 






- ▲ Improved scheme 

[ * 






; * a 


A 



topological ohorqe Q 



Fig. 2. Performance of the standard and the improved ISU algorithm for the elimina- 
tion of instanton modes. The data were generated on one sub-lattice of an 18 2 -lattice 
at P = 10 with a U(l) gauge field. The improved algorithm uses Q interpolation 
operators on the last point to eliminate the almost-zero modes. (The number of 
configurations evaluated for the different topological charges was 217, 113, and 22. 

) 

bad-converging modes. The data were generated on one sub-lattice of an 18 2 - 
lattice at f3 — 10. For this high value of (3, the convergence in the absence 
of instantons is good, as can be seen from the value at Q = 1. The standard 
method works well for instanton charge or 1, as explained above, and its 
sensitivity to higher instanton charges is striking. The improved method shows 
no dependence on the instanton charge, the convergence is good in all cases. 
Note also that the standard deviation is much higher for the usual method 
because it is affected by fluctuations in the eigenvalues of the bad-converging 
modes. 

Clearly the improved method is superior — the cost of calculating the instanton 
modes is about 10 iterations for each instanton plus the cost of the orthogo- 
nalization, whereas the saving in the solution of the final equations is of the 
order of hundreds of iterations depending on how much we want to reduce the 
residual. 



5 PIE and ISU 

We have seen above that the principle of indirect elimination will only be 
helpful when the number of bad-converging modes is not too large. In the 
case of simple relaxation methods, however, this number is of order 0(n), so 
storing them would cost 0(n 2 ), where n is the number of degrees of freedom 
in the system. So it seems that the method is useless in such cases. In this 
section we want to explain how the Iteratively Smoothing Unigrid algorithm 
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(ISU) presented in |2| can be regarded as the local application of the principle 
of indirect elimination. We will only present the basic idea here. Some famil- 
iarity with the basic multigrid idea is assumed in this section, see [|]|)]|| f° r 
introductions. 

We can associate a length scale (e.g. a wavelength) with each modes of our 
system. Because they are local, relaxation methods eliminate all those modes 
corresponding to a length scale of the order of one lattice spacing. Usually there 
will be 0(n/2) of these. Of the remaining modes (D(n/4) will be associated 
with length scale 2a (a is the lattice spacing), 0(n/8) with scale 4a and so 
on. So there will be many bad-converging modes with a small length scale 
and only a few corresponding to a large scale. The ISU algorithm is a method 
to calculate interpolation operators that are able to span the space of these 
modes. These operators are restricted to parts of the lattice, the size of the 
domain being determined by the scale of the mode that is to be approximated 
by the operator. 

To be more specific, let us start with the smallest scale 2a. As in usual multi- 
grid methods, we divide the hypercubic lattice into (overlapping) hypercubes 
or blocks [x] of side length 3. Then we try to solve the equation D|r x ].AM(2) = 
using a relaxation method. Here D^] is the restriction of D to the block [x] 
using Dirichlet boundary conditions. What remains after a few iterations will 
be the slowest-converging mode on this scale and can therefore be used as in- 
terpolation operator on the first block-lattice. Repeating this for all the small 
hypercubes, we know the shapes of the bad-converging modes on scale 2a. Now 
we do the same on the next scale, dividing the lattice into larger blocks (of side 
length 7, agreeing with the formula 2 J 1 — 1). Again we try to solve the equation 
D | [3.1^^1(2) = 0, where [x] now denotes the larger blocks. The important point 
is that we use the interpolation operators on the smaller scale that are already 
known for this calculation to eliminate contributions from the bad-converging 
modes on the smaller scale. In this way we proceed to larger and larger hyper- 
cubes, always using the interpolation operators already known. This method 
would only fail if a large number of bad-converging modes lived on a large 
length-scale. 

It has been found that this algorithm is able to eliminate critical slowing down 
completely for the case of the two-dimensional Laplace equation in an SU(2) 
gauge field background at arbitrarily large values of the gauge field disorder 
and the lattice size. An improved version has been shown to do the same 
for the two-dimensional squared Dirac equation, except for extremely large 
disorder {(3 ~ 2 or smaller). See for details. 

An idea that is similar in spirit to the principle of indirect elimination has been 
discussed in 0, section 4.6]. Brandt proposes to do relaxations on the funda- 
mental lattice with arbitrary starting vectors to determine "typical shapes of a 
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slow-to-converge error" which could then be used to determine good multigrid 
interpolation operators. Unfortunately, this idea suffers from a severe disease: 
The number of modes that converge badly under simple relaxation is huge 
(about half of the number of grid points). What one will get by this procedure 
is a mixture of low-lying eigenmodes with contributions depending on their 
eigenvalues. The time needed to arrive at a function that consists only of the 
lowest eigenmodes will be proportional to the lattice size, so the method will 
not work without critical slowing down. 

The difference to the ISU algorithm is that this is a so-called unigrid method. 
It allows for interpolation operators living on different length-scales, whereas 
standard multigrid algorithms only use interpolation operators living on small 
domains. On each length scale we need not represent all modes that converge 
badly on this and on all higher scales; only the modes that belong to the 
scale corresponding to a certain lattice constant have to be dealt with. The 
next-coarser length-scale will then take care of the modes corresponding to 
this scale and in their computation the smaller scales are already taken into 
account. 



6 Conclusions 



We have presented a simple method to improve the convergence of solution 
algorithms for discretized differential equations when the number of bad- 
converging modes is small. The principle of indirect elimination used to do 
this is based on the general idea that an algorithm can be used to identify 
its own bad-converging modes. Conceptionally, the method is similar to the 
general idea of accelerating algorithms described in pi: One tries to find out 
what the slow modes of the algorithm are and uses this knowledge to improve 
the algorithm. For example, multigrid methods are based on the fact that the 
slow modes are the smooth modes that can be obtained by smooth interpola- 
tion. The principle of indirect elimination serves to automatize this process in 
the case when the number of slow modes is small so that it suffices to know 
them without doing further analysis of their structure. 



The method has been studied for the case of the Dirac equation in a gauge field 
background with instantons and worked extremely well. Applying it locally 
leads to a multigrid method called the Iteratively Smoothing Unigrid. 
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